Clinical and allelic heterogeneity in dystrophic epidermolysis bullosa: lessons from an Indian cohort

Background Dystrophic epidermolysis bullosa (DEB) is caused by variations in the COL7A1 gene. The clinical phenotype and severity depend on the type of variation and the domain of the protein affected.
Objectives To characterize the spectrum of COL7A1 variations in a cohort of DEB patients from India, to correlate these findings with clinical phenotypes and to establish a genotype-phenotype correlation.
Methods This was a retrospective, observational study involving patients with DEB diagnosed on the basis of clinical manifestations, immunofluorescence antigen mapping (IFM) and genetic analysis. A genotype-phenotype correlation was attempted and observations were further explained using IFM on skin biopsies and molecular dynamics simulations. Descriptive statistics were computed using SPSS version 20.0, with P values of <0.05 considered significant.
Results We report 68 unrelated Indian DEB patients classified as RDEB-Intermediate (RDEB-I), RDEB-Severe (RDEB-S) or DDEB based on the EB diagnostic matrix, immunofluorescence antigen mapping and genetic data. Of the 68 DEB patients, 59 (86.76%) had recessive DEB (RDEB) and 9 (13.24%) had dominant DEB (DDEB). Limbal stem cell deficiency was seen in four cases of RDEB-S very early in the course of the disease. A total of 88 variants were detected, of which 66 were novel. There were no hotspots, and recurrent variations were seen in a very small group of patients. We found a high frequency of compound heterozygotes (CH) among RDEB patients born of non-consanguineous marriages. RDEB patients older than two years who had oral mucosal involvement and/or deformities were more likely to have esophageal involvement. Genotype-phenotype correlation showed a higher frequency of extracutaneous manifestations and deformities in patients with premature termination codons (PTCs) than in patients with other variations. In patients with missense variations, molecular simulation studies pointed to a severe phenotype when the variations were localized in interrupted regions of the Gly-X-Y repeats.
Conclusion This large study of DEB patients in South Asia adds to the continually expanding genetic database of this condition. The study has direct implications for management, as this group of patients can be screened early and managed appropriately.


Introduction
Epidermolysis bullosa (EB) is a group of inherited blistering disorders with clinical and genetic heterogeneity. There are four major types: EB simplex (EBS), junctional EB (JEB), dystrophic EB (DEB) and mixed EB (Kindler EB) [1,2]. DEB is further classified as dominant DEB (DDEB) and recessive DEB (RDEB) based on the pattern of inheritance. It is caused by variations in COL7A1, which encodes type VII collagen. COL7A1 is one of the largest genes with about 118 exons [3]. While DDEB occurs due to heterozygous glycine substitutions resulting in a milder phenotype, RDEB is usually secondary to substitutions, nonsense, frameshift or splice site variations on both COL7A1 alleles, which leads to severe disease, with extracutaneous manifestations [4].
Type VII collagen, a major component of anchoring fibrils, has a central triple helical domain (THD) flanked by non-helical (non-collagenous) domains, NC1 at the N-terminal end and NC2 at the C-terminal end. The THD mainly consists of Gly-X-Y repeats with interrupted regions [5]. Although over 700 variations in COL7A1 have been reported to cause DEB, their impact on the structure and function of the protein and on disease pathophysiology is poorly understood [6]. Apart from the genetic make-up, other factors such as environment, presence of mutagens, diet and stress also affect the clinical phenotype.
While genotype-phenotype correlations in DEB have been described in various ethnic populations across the world, phenotypic variations amongst these groups continue to be reported [7][8][9][10][11].
One study estimated the incidence and point prevalence of EB to be 41.3 per million live births and 22.4 per million population, respectively. The authors postulated that with a high detection rate in a well-organized set-up, the incidence and prevalence may be higher than previously believed [12]. In South Asia, with a population of over 1 billion, there are likely to be large numbers of individuals affected by EB. However, attempts to genetically characterize this cohort have so far been few and far between. There is no reliable estimate of disease frequency and disease burden from South Asia in the literature. With gene therapy completing multiple Phase III clinical trials in the West, there is an urgent need to identify the molecular/genetic markers of this population. At our centre, multidisciplinary EB clinics have been conducted over the past 10 years, and patients with different types of EB have been referred for care and management. Here we report the clinical and genetic profile of 68 DEB patients and attempt a genotype-phenotype correlation in them. Additionally, we have used in-silico modelling of collagen-like peptides as a surrogate tool to understand the effect of the variations.

Materials and methods
This was a cross-sectional, retrospective, observational study involving DEB patients referred to our EB clinic at the Centre for Human Genetics and Manipal Hospital, Bangalore, India, from 2009 to 2021. Patients were enrolled into the study with informed consent from patients and assent from children where appropriate. The EB matrix was first used to clinically delineate the patients into RDEB-I, RDEB-S and DDEB subgroups, following which the 'onion-skin approach' of the consensus criteria (i.e., clinical features, immunofluorescence antigen mapping and exome sequencing) was used to make the final diagnosis [1,2]. Key clinical features that are specific for DEB were recorded, viz., mucosal involvement (oral blisters, ankyloglossia and microstomia; ocular pathology; gastrointestinal tract: esophageal strictures); scarring alopecia of the scalp; and deformities (syndactyly and contractures). Esophageal involvement was identified based on a history of dysphagia to solids and liquids and confirmed by barium swallow whenever possible. Ocular involvement was confirmed by a pediatric ophthalmologist with experience in EB. Growth parameters were expressed in adults as BMI and in children as percentiles using the Indian Academy of Pediatrics (IAP) growth charts for boys and girls.
The diagnosis was confirmed by immunofluorescence antigen mapping (IFM) [13] and clinical exome sequencing. The variants identified on exome sequencing, including the novel variants, were confirmed by Sanger validation whenever available, for patients and parents. The effect of variations on protein function was predicted using ACMG clinical laboratory standards for next-generation sequencing [14]. All consecutive patients satisfying the above criteria were included for analysis. Patients without genetic data were excluded from the study. To understand the prevalence of COL7A1 gene variation, variant allele frequency was calculated as per the following formula: Prevalence of COL7A1 gene variation = (number of occurrences of the COL7A1 gene variation) / (total number of exomes scanned).
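The frequency formula above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `variant_frequency` and the example counts are hypothetical, and the denominator follows the text in counting exomes scanned.

```python
def variant_frequency(n_occurrences: int, n_exomes: int) -> float:
    """Frequency of a variant as defined in the text:
    occurrences of the variant / total number of exomes scanned."""
    if n_exomes <= 0:
        raise ValueError("n_exomes must be positive")
    if n_occurrences < 0:
        raise ValueError("n_occurrences must be non-negative")
    return n_occurrences / n_exomes


# Hypothetical counts for illustration only (not data from this study).
freq = variant_frequency(n_occurrences=3, n_exomes=1000)
print(f"{freq:.4f}")
```

Keeping the calculation in a small function makes the choice of denominator explicit, which matters when comparing against databases that report frequencies per chromosome rather than per exome.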
To understand the effects of missense variations, we performed computational studies of the conformation, dynamics and stability of the protein using the sequence Q02388 (collagen alpha-1(VII) chain) as the wild type [15]. The parameters included for the simulation were diameter, secondary structure analysis and hydrogen bond occupancy. The simulation parameters and protocols were as described earlier [16]. The analysis was performed using the GROMACS analysis tools and visualized using the PyMOL package (GROMACS version 2020.1, http://www.gromacs.org/; The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC). The technical conditions and protocols for the various genetic methods followed are provided in S1 File.
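Hydrogen bond occupancy is commonly computed as the fraction of trajectory frames in which a given hydrogen bond is present. The sketch below illustrates that definition in Python under assumptions: the per-frame presence flags are hypothetical inputs (for example, exported from a trajectory analysis tool), and the helper name `hbond_occupancy` is illustrative and not part of GROMACS or PyMOL.

```python
from typing import Sequence


def hbond_occupancy(present: Sequence[bool]) -> float:
    """Fraction of frames in which a hydrogen bond is present."""
    if len(present) == 0:
        raise ValueError("at least one frame is required")
    return sum(1 for p in present if p) / len(present)


# Hypothetical per-frame flags for one inter-chain hydrogen bond
# (True = bond present in that frame); not data from this study.
wild_type = [True, True, True, False, True, True, True, True]
mutant = [True, False, False, True, False, False, True, False]

print(hbond_occupancy(wild_type))
print(hbond_occupancy(mutant))
```

A lower occupancy in the mutant than in the wild type would correspond to the kind of reduction in inter-chain hydrogen bonding that the simulations were designed to detect.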
Patient information was anonymized for the purpose of data analysis. Categorical variables were described using percentages, and continuous variables were described using median and interquartile range. Chi-square and Fisher's exact tests were performed to assess associations between categorical variables. P values of <0.05 were considered significant. All analyses were performed using SPSS version 20.0. The study was approved by the Institutional Ethics Committee (Ref No CHG/077/2020-21/001-07/07/2020) and was conducted according to the Declaration of Helsinki. Written informed consent was obtained from all patients, parents or guardians. The patients in this manuscript have given written informed consent to publication of their case details.

Results
The median age of patients was 15 months (IQR 2.75-138.75 months). Only 17 patients were older than 10 years of age. Extracutaneous manifestations including growth failure, esophageal involvement (37/59, 63% in RDEB; 0/9, 0% in DDEB), eye involvement (21/59, 36% in RDEB; 2/9, 22% in DDEB) and webbing of digits and pseudo-syndactyly (29/59, 49% in RDEB; 0/9, 0% in DDEB) were noted more commonly in the RDEB group than in the DDEB group. The frequency and pattern of clinical involvement in the different subtypes of DEB are depicted in Table 1 and Fig 1A-1H. In the RDEB group, the median age of patients who had webbing or fusion was 60 months (IQR 1-185 months), whereas the median age of patients who did not have them was 5 months (IQR 1-90.25 months). Scarring alopecia was seen in 8/59 RDEB patients and limbal stem cell deficiency (LSCD) in four patients. Genito-urinary involvement, photosensitivity or squamous cell carcinoma (SCC) was not detected in any of the patients. Fourteen patients with RDEB died in our cohort, of whom definitive age at death was available for 4 patients (10 days, 4 months, 1 year 8 months and 15 years) and estimated age at death was available for another 4 patients (8 years, 9.5 years, 15.5 years and 20 years).
We noted that 20/27 patients with RDEB-I had oral involvement as opposed to only 1/9 patients with DDEB (p value = 0.001). Among the 68 patients, we also observed that all those over 2 years of age with oral mucosal involvement also had esophageal involvement, while those without oral lesions did not have esophageal involvement either (Spearman correlation coefficient = 0.5; p value = 0.003). Similarly, a higher number of patients over two years of age with deformities had esophageal involvement than those without deformities, and the difference was statistically significant (Spearman correlation coefficient = 0.6; p value < 0.001). A positive correlation was noticed between age and development of deformities (Spearman correlation coefficient = 0.3; p value = 0.003), whereas no correlation was noticed between age and development of esophageal involvement (Spearman correlation coefficient = 0.17; p value = 0.16).
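The comparison of oral involvement in RDEB-I versus DDEB can be checked with a two-sided Fisher's exact test on the 2x2 table of counts given above. The sketch below is a minimal pure-Python version, written under assumptions: the text does not state which test produced the reported p value, the analysis itself was done in SPSS, and the function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, row1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Oral involvement: 20 of 27 RDEB-I patients versus 1 of 9 DDEB patients.
p = fisher_exact_two_sided(20, 7, 1, 8)
print(f"p = {p:.4f}")
```

For these counts the sketch gives a p value of roughly 0.0015, well below the 0.05 threshold used in the study and of the same order as the reported value.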

Molecular analysis
Fifty-nine of the 68 DEB patients (86.76%) had RDEB and 9 (13.24%) had DDEB. A total of 88 variants were detected, 79 in RDEB and 9 in DDEB. Of these, 66 variants in 51 patients were novel. Sanger sequencing data from parents were available for 39 variants in 31 patients, i.e., we were able to confirm segregation in 31/51 patients with novel variants (S1 Table). Only 9 of the 88 variants (10.22%) were recurrent, and no hotspots were observed. The variations and their frequency in the various subtypes of DEB, along with the domains involved, are shown in Fig 2A and 2B, respectively. The compound heterozygous variants were grouped separately from the homozygous variants. We also studied the effect of all 88 variants in our cohort, which were classified according to the ACMG guidelines as 'pathogenic', 'likely pathogenic' or of 'uncertain significance'. The detailed description of individual variants in RDEB-I and RDEB-S is presented in Table 2.
We also studied the effect of glycine and non-glycine substitutions in 3 RDEB-I, 4 RDEB-S and 1 DDEB using molecular dynamics simulations (Fig 3 and Table 3). In RDEB-S, simulations were performed for the missense variants G2749R, G2749E, G2461E and G2575R, and in RDEB-I for G1782R, R2008L and R2063W. We also performed the simulation for G2659E, which was inherited in both autosomal recessive (AR) and autosomal dominant (AD) patterns. All simulations involving substitutions were compared with the respective wild-type amino acid sequence. The diameter of the wild type ranged from 0.5-0.7 nm, while that of the mutant protein ranged from 0.6-1.1 nm. The increase in diameter was higher in mutants involving the interrupted region and/or in those where glycine was substituted by arginine or glutamic acid. In the G2659E model, changes in diameter were more pronounced in the AR than in the AD pattern. With respect to changes in secondary structure, formation of beta sheets was seen in the G2749R and G2461E mutants, both of which were present in the interrupted region. In the former, the changes persisted throughout the simulation, while in the latter they were transient. This change was reflected by changes in the dihedral angles (phi and psi) in both models when compared to wild types. Finally, in all mutants, there was a reduction in hydrogen bond occupancy that was more pronounced in glycine substitutions than in non-glycine substitutions. Hydrogen bond occupancy indirectly reflects the stability between the three chains of type VII collagen; these bonds typically occur between the glycine of one chain and the X residue of another chain in the Gly-X-Y repeats.

PLOS ONE

Genotype-phenotype correlation
The detailed clinical phenotypes, IFM findings, genetic variations with their in-silico predictions, and the domains of the protein affected in 59 RDEB and 9 DDEB patients are shown in Table 2.
Overall, PTCs were more likely to be associated with severe or RDEB-S phenotype (9/10, 90%) and compound heterozygous variations were more likely to be associated with a milder or RDEB-I phenotype (14/20, 70%). In RDEB patients, those with PTCs showed a higher
Of the 59 RDEB patients, 32 (54%) had a family history of consanguinity: 13 with RDEB-I, 17 with RDEB-S and 2 with RDEB-I/S. We observed that of the 20 compound heterozygotes in our cohort, 18 patients (90%) had no history of parental consanguinity. This prompted us to study the allele frequency of collagen VII variants in the Indian population, and we were able to obtain data for four variants (S2 Table).

Discussion
We describe the clinical subtypes, age at presentation and the spectrum of extracutaneous features in 68 Indian DEB patients from unrelated families. Our observations suggest that children older than 2 years of age with oral mucosal involvement and/or deformities were more likely to have esophageal involvement than those without them. These associations were statistically significant (p value = 0.003 and p value < 0.001, respectively). Hence, it would be appropriate to perform a barium swallow in RDEB patients who have oral mucosal involvement and deformities together with swallowing difficulty, in order to rule out esophageal strictures.
It was interesting to note that the majority of patients with RDEB-I type had oral involvement as opposed to DDEB where it was absent. This finding was statistically significant and may be a clinical clue to differentiate milder forms of RDEB from DDEB. Certain clinical manifestations including scalp involvement and squamous cell carcinoma (SCC) were either seen less frequently or not seen at all (8/59 RDEB cases of scalp involvement and no cases of SCC) in our cohort. This can possibly be attributed to the fact that the majority of the patients in our cohort were less than 5 years of age (36/59 patients). Since scarring complications in DEB evolve over time, it is likely that the younger age of our cohort skewed the results towards a lower frequency of these manifestations.
We also report 4 patients with LSCD in RDEB-S and this was not associated with any specific type of variation. Vazirani et al and Cheung et al have reported EB as one of the causes of LSCD [18,19]. Previously Thanos et al have described LSCD in an 11.5-year-old boy with RDEB-S, which was successfully treated with human amniotic membrane [20]. All patients in our study also had generalized severe type of RDEB, and it may be worthwhile for the ophthalmologist to specifically look for this complication in such cases. It could be speculated that a chronic inflammatory state in the cornea in RDEB could result in this condition [21]. Larger studies supported by mouse models would help understand the condition better.
Our study confirms that DDEB is a consequence of glycine substitutions, a majority localized to the THD although del/dup and splice-site variants were also observed.
Our study had a relatively high percentage of compound heterozygous variations (20/59; 33.89%), most of which occurred in patients born to non-consanguineous parents (18/20, 90%). This is a notable finding and might indicate a high frequency of collagen VII variants in our population.
Universally, RDEB-S is a result of nonsense, missense, deletion/duplication and splice-site variations, the majority being localized in the THD [4,9,11,22]. Our observations are consistent with this: nonsense variations with PTCs contributed to 30% of cases, and splice-site and del/dup variations to about 10% each. All of these behaved like PTCs, resulting in premature truncation of the amino acid chain, a shortened non-functional protein and, consequently, absent type VII collagen staining on IFM and a severe phenotype. Missense and compound heterozygous variations contributed to almost half the RDEB-S cases and were mostly localized to the THD. The THD contains Gly-X-Y repeats that are interrupted in certain areas by non-glycine residues. We attempted to explain the severity of phenotype in missense variations through molecular simulations [23]. We observed that changes in diameter and secondary structure were more pronounced in variations occurring in the interrupted regions, along with a loss of conventional hydrogen bonds, resulting in an increased diameter of the triple helix. These changes can be explained by substitution of glycine with bulky amino acids such as arginine and glutamic acid [24]. Taken together, these changes may reduce the ability to form trimers, leading to the formation of an abnormal type VII collagen, or make it more susceptible to extracellular matrix proteases, finally resulting in compromised integrity of the anchoring fibrils and hence a more severe phenotype [24][25][26][27][28]. However, these cases demonstrated partial type VII collagen assembly, as evidenced by positive staining for type VII collagen on IFM. Also, the molecular simulations were performed on hypothetical peptide models, and hence electron microscopic studies or functional assays may be necessary to validate these simulations. Six compound heterozygous patients with RDEB-S carried combinations of missense, PTC, deletion and duplication variations (Table 2). Most of these variations were localized to the THD and eventually behaved like a PTC, resulting in a severe phenotype. This was evidenced by absent or reduced staining for type VII collagen on IFM and by the appearance of extensive scarring and deformities in these patients.
Our study confirms that bi-allelic missense variations, or a missense variation in combination with a PTC, del/dup or splice-site variation, could result in RDEB-I. Most of the variations were localized to NC1, NC2 or a combination of these with the THD. Out of 27 patients with the RDEB-I subtype, the majority had compound heterozygous (n = 14) and missense (n = 6) variations, where there was either no loss of protein or only a minor alteration of the protein. The simulations performed on the 3 RDEB-I variants also revealed only minor changes in diameter and hydrogen bond occupancy and no changes in secondary structure. These substitutions were located away from the interruptions of the Gly-X-Y repeats, all of which is consistent with a milder phenotype. This was reflected as normal or reduced type VII collagen staining on IFM. Clinically, these patients had only cutaneous involvement or very limited extracutaneous involvement. We observed that some of the patients with splice-site variations showed a milder RDEB-I phenotype. However, they were mostly less than 2 years of age, and a longer follow-up would help ascertain whether these patients develop more severe features.
Several studies have described genotype-phenotype correlations in DEB [3,7,8,10,11,22,29-34]. We noted that there was a significant association between PTCs and mitten hand deformity and esophageal involvement, as compared with all other mutations combined. Similarly, mutations localized to the THD were more likely to be associated with esophageal involvement. This has implications with respect to clinical care as these children can be identified in a timely fashion and be followed up more often. More funds can be earmarked early on for them, thus enabling a more efficient utilization of resources in resource-limited settings.
As this was a retrospective study, many of our older patients were lost to follow-up or had died. Hence, clinical phenotyping could not be kept up to date with features developing over time. Immunofluorescence antigen mapping was performed only in cases where a clinical diagnosis could not be established. Hence, the expression of the mutant protein in the skin could not be assessed in all patients, which would have helped in understanding the disease mechanism better. We did not use any scoring systems to assess severity. Instead, the EB diagnostic matrix was used to clinically classify DEB patients, and its accuracy is limited below 2 years of age. A larger prospective study of DEB patients with complete clinical phenotyping would help in better genotype-phenotype correlation. This may also have implications for choosing the right candidates for gene therapy when it becomes a possibility.

Conclusion
This is a single-centre study representing a large cohort of DEB patients from South Asia, including clinical and genetic data. RDEB patients above 2 years of age with oral involvement and/or deformities were noted to have a higher frequency of esophageal involvement, emphasizing the need for a barium swallow early in the course of the disease. Unusual manifestations in the cohort included LSCD. The study reports a high number of patients with compound heterozygous variations, suggesting that such alleles are prevalent in the population. A combination of molecular simulations and IFM was used to explain severity, with findings fairly consistent with existing knowledge. The large number of novel mutations and the absence of hotspots reflect the need to individualize genetic profiles in different ethnic groups for appropriate genetic counselling, prenatal diagnosis and gene therapy.

Supporting information
S1 Checklist. STROBE Statement-Checklist of items that should be included in reports of observational studies. (DOCX)
S1